70 research outputs found
Reinforcement Learning in Education: A Multi-Armed Bandit Approach
Advances in reinforcement learning research have demonstrated the ways in
which different agent-based models can learn how to optimally perform a task
within a given environment. Reinforcement learning addresses problems without
explicit supervision, in which agents move through a state-action-reward loop to
maximize cumulative reward, which in turn optimizes the solving of a specific problem
in a given environment. However, these algorithms are designed based on our
understanding of actions that should be taken in a real-world environment to
solve a specific problem. One such problem is the ability to identify,
recommend and execute an action within a system where the users are the
subject, such as in education. In recent years, the use of blended learning
approaches, which integrate face-to-face learning with online learning, has
increased. Additionally, online platforms used for
education require the automation of certain functions such as the
identification, recommendation or execution of actions that can benefit the
user, in this case the student or learner. As promising as these scientific
advances are, there is still a need to conduct research in a variety of
different areas to ensure the successful deployment of these agents within
education systems. Therefore, the aim of this study was to contextualise and
simulate the cumulative reward within an environment for an intervention
recommendation problem in the education context.
Comment: 17 pages, 6 figures, 1 table, EAI AFRICATEK 2022 Conference
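The intervention-recommendation setting above can be framed as a multi-armed bandit in which each arm is a candidate intervention and the simulated quantity is cumulative reward. Below is a minimal epsilon-greedy sketch of such a simulation; the intervention names and Bernoulli reward probabilities are illustrative assumptions, not values from the paper.

```python
import random

def simulate_bandit(reward_probs, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit; returns cumulative reward.

    reward_probs: assumed Bernoulli success rate of each intervention (arm).
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n          # pulls per arm
    values = [0.0] * n        # running mean reward estimate per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total

# Hypothetical interventions: extra tutorial, reminder email, practice quiz.
print(simulate_bandit([0.2, 0.5, 0.8]))
```

Over enough steps the agent concentrates pulls on the best intervention, so cumulative reward approaches the best arm's rate times the horizon.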
Learning domain abstractions for long lived robots
Recent trends in robotics have seen more general-purpose robots being deployed in
unstructured environments for prolonged periods of time. Such robots are expected to
adapt to different environmental conditions, and ultimately take on a broader range of
responsibilities, the specifications of which may change online after the robot has been
deployed.
We propose that in order for a robot to be generally capable in an online sense
when it encounters a range of unknown tasks, it must have the ability to continually
learn from a lifetime of experience. Key to this is the ability to generalise from experiences
and form representations which facilitate faster learning of new tasks, as well as
the transfer of knowledge between different situations. However, experience cannot be
managed naïvely: one does not want constantly expanding tables of data, but instead
continually refined abstractions of the data, much like humans seem to abstract and
organise knowledge. If this agent is active in the same, or similar, classes of environments
for a prolonged period of time, it is provided with the opportunity to build
abstract representations in order to simplify the learning of future tasks. The domain
is a common structure underlying large families of tasks, and exploiting this affords
the agent the potential to not only minimise relearning from scratch, but over time to
build better models of the environment. We propose to learn such regularities from the
environment, and extract the commonalities between tasks.
This thesis aims to address the major question: what are the domain invariances
which should be learnt by a long lived agent which encounters a range of different
tasks? This question can be decomposed into three dimensions for learning invariances,
based on perception, action and interaction. We present novel algorithms for
dealing with each of these three factors.
Firstly, how does the agent learn to represent the structure of the world? We focus
here on learning inter-object relationships from depth information as a concise
representation of the structure of the domain. To this end we introduce contact point
networks as a topological abstraction of a scene, and present an algorithm based on
support vector machine decision boundaries for extracting these from three-dimensional
point clouds obtained from the agent's experience of a domain. By reducing the
specific geometry of an environment into general skeletons based on contact between
different objects, we can autonomously learn predicates describing spatial relationships.
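The thesis extracts contact point networks via support vector machine decision boundaries; as a much simpler illustration of the underlying idea, contact between two objects can be approximated by thresholding distances between their point clouds and turning the result into a spatial predicate. The clouds and threshold below are toy assumptions.

```python
import numpy as np

def contact_points(cloud_a, cloud_b, eps=0.05):
    """Return points of cloud_a lying within eps of some point of cloud_b.

    A crude stand-in for SVM-boundary extraction: nearby point pairs
    approximate the contact region between two objects.
    """
    # Pairwise distances between the two clouds, shape (n_a, n_b).
    d = np.linalg.norm(cloud_a[:, None, :] - cloud_b[None, :, :], axis=-1)
    mask = d.min(axis=1) < eps
    return cloud_a[mask]

def touching(cloud_a, cloud_b, eps=0.05):
    """Spatial predicate: True if the objects share a contact region."""
    return contact_points(cloud_a, cloud_b, eps).shape[0] > 0

# Toy clouds: a cube face resting just above a plane (synthetic data).
plane = np.array([[x, y, 0.0] for x in np.linspace(0, 1, 5)
                              for y in np.linspace(0, 1, 5)])
cube_bottom = plane + np.array([0.0, 0.0, 0.01])
print(touching(cube_bottom, plane))
```

Predicates such as "on" or "touching" can then be learnt from which pairs of skeleton nodes repeatedly share contact regions across observed scenes.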
Secondly, how does the agent learn to acquire general domain knowledge? While
the agent attempts new tasks, it requires a mechanism to control exploration, particularly
when it has many courses of action available to it. To this end we draw on the fact
that many local behaviours are common to different tasks. Identifying these amounts
to learning "common sense" behavioural invariances across multiple tasks. This principle
leads to our concept of action priors, which are defined as Dirichlet distributions
over the action set of the agent. These are learnt from previous behaviours, and expressed
as the prior probability of selecting each action in a state, and are used to guide
the learning of novel tasks as an exploration policy within a reinforcement learning
framework.
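A minimal sketch of the action-prior idea described above, assuming Dirichlet pseudo-counts per state that are incremented whenever a previous policy selected an action in that state; the state and action labels are hypothetical.

```python
import random
from collections import defaultdict

class ActionPriors:
    """Per-state Dirichlet counts over actions, learnt from earlier policies.

    Each time a previous policy chose action a in state s, the corresponding
    count is incremented; the normalised counts then bias exploration when
    learning a new task.
    """
    def __init__(self, n_actions, alpha0=1.0):
        self.n = n_actions
        # alpha0 is a uniform pseudo-count so unseen actions keep mass.
        self.counts = defaultdict(lambda: [alpha0] * n_actions)

    def observe(self, state, action):
        self.counts[state][action] += 1.0

    def prior(self, state):
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]

    def sample_exploratory_action(self, state, rng=random):
        # Draw an action from the prior instead of uniformly at random.
        return rng.choices(range(self.n), weights=self.prior(state))[0]

priors = ActionPriors(n_actions=4)
for _ in range(20):            # earlier tasks mostly chose action 2 here
    priors.observe("doorway", 2)
print(priors.prior("doorway"))
```

Used as an exploration policy inside a reinforcement learner, this concentrates early trials on actions that were sensible in previous tasks while never zeroing out the rest.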
Finally, how can the agent react online with sparse information? There are times
when an agent is required to respond quickly in some interactive setting, where it may have
encountered similar tasks previously. To address this problem, we introduce the notion
of types, being a latent class variable describing related problem instances. The agent
is required to learn, identify and respond to these different types in online interactive
scenarios. We then introduce Bayesian policy reuse as an algorithm that involves maintaining
beliefs over the current task instance, updating these from sparse signals, and
selecting and instantiating an optimal response from a behaviour library.
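The belief-update-and-select loop of Bayesian policy reuse can be sketched as follows; the observation likelihoods and utility table are toy assumptions standing in for the offline performance models the thesis describes.

```python
def bpr_update(belief, likelihoods):
    """One Bayesian belief update over latent task types.

    belief: prior P(type); likelihoods: P(observed signal | type), e.g.
    from an offline performance model of each policy on each type.
    """
    post = [b * l for b, l in zip(belief, likelihoods)]
    z = sum(post)
    return [p / z for p in post]

def select_policy(belief, utility):
    """Pick the library policy with highest expected utility under the belief.

    utility[p][t]: assumed expected return of policy p on task type t.
    """
    best, best_u = 0, float("-inf")
    for p, row in enumerate(utility):
        u = sum(b * r for b, r in zip(belief, row))
        if u > best_u:
            best, best_u = p, u
    return best

# Two latent types, two library policies (toy performance model).
belief = [0.5, 0.5]
utility = [[1.0, 0.2],   # policy 0 suits type 0
           [0.3, 0.9]]   # policy 1 suits type 1
belief = bpr_update(belief, likelihoods=[0.1, 0.7])  # signal favours type 1
print(select_policy(belief, utility))
```

Because each sparse signal only reweights a small belief vector, the agent can reselect from its behaviour library at every interaction step.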
This thesis therefore makes the following contributions. We provide the first algorithm
for autonomously learning spatial relationships between objects from point
cloud data. We then provide an algorithm for extracting action priors from a set of
policies, and show that considerable gains in speed can be achieved in learning subsequent
tasks over learning from scratch, particularly in reducing the initial losses associated
with unguided exploration. Additionally, we demonstrate how these action priors
allow for safe exploration, feature selection, and a method for analysing and advising
other agents' movement through a domain. Finally, we introduce Bayesian policy
reuse which allows an agent to quickly draw on a library of policies and instantiate the
correct one, enabling rapid online responses to adversarial conditions.
On The Specialization of Neural Modules
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional
architectures which aim to learn specialized modules dedicated to structures in a
task that can be composed to solve novel problems with similar structures. While
the compositionality of these architectures is guaranteed by design, the
specialization of the modules is not. Here we theoretically study the ability of network modules
to specialize to useful structures in a dataset and achieve systematic generalization. To this end we introduce a minimal space of datasets motivated by practical
systematic generalization benchmarks. From this space of datasets we present a
mathematical definition of systematicity and study the learning dynamics of linear
neural modules when solving components of the task. Our results shed light on the
difficulty of module specialization, what is required for modules to successfully
specialize, and the necessity of modular architectures to achieve systematicity.
Finally, we confirm that the theoretical results in our tractable setting generalize to
more complex datasets and non-linear architectures.
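As a tractable illustration of the linear setting studied above, two linear modules can be trained on a toy compositional dataset whose target factorises into two independent structures, each acting on half of the input; this dataset construction is an assumption for illustration, not the paper's benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compositional dataset: the target factorises into two independent
# linear structures, one per half of the input.
W1_true = rng.normal(size=(3, 4))
W2_true = rng.normal(size=(3, 4))
X = rng.normal(size=(200, 8))
Y = np.hstack([X[:, :4] @ W1_true.T, X[:, 4:] @ W2_true.T])

# Two linear modules trained by gradient descent on the joint data;
# a module "specialises" if it recovers its own underlying structure.
W1 = np.zeros((3, 4))
W2 = np.zeros((3, 4))
lr = 0.1
for _ in range(500):
    E = np.hstack([X[:, :4] @ W1.T, X[:, 4:] @ W2.T]) - Y
    W1 -= lr * E[:, :3].T @ X[:, :4] / len(X)
    W2 -= lr * E[:, 3:].T @ X[:, 4:] / len(X)

print(np.allclose(W1, W1_true, atol=1e-2))
```

Because the architecture mirrors the factorisation of the data, each module converges to its own structure; when module boundaries and data structure are misaligned, no such recovery is guaranteed, which is the failure mode the paper analyses.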
FABRIC: A Framework for the Design and Evaluation of Collaborative Robots with Extended Human Adaptation
A limitation for collaborative robots (cobots) is their lack of ability to
adapt to human partners, who typically exhibit an immense diversity of
behaviors. We present an autonomous framework as a cobot's real-time
decision-making mechanism to anticipate a variety of human characteristics and
behaviors, including human errors, toward a personalized collaboration. Our
framework handles such behaviors at two levels: 1) short-term human behaviors
are adapted through our novel Anticipatory Partially Observable Markov Decision
Process (A-POMDP) models, covering a human's changing intent (motivation),
availability, and capability; 2) long-term changing human characteristics are
adapted by our novel Adaptive Bayesian Policy Selection (ABPS) mechanism that
selects a short-term decision model, e.g., an A-POMDP, according to an estimate
of a human's workplace characteristics, such as her expertise and collaboration
preferences. To design and evaluate our framework over a diversity of human
behaviors, we propose a pipeline where we first train and rigorously test the
framework in simulation over novel human models. Then, we deploy and evaluate
it on our novel physical experiment setup that induces cognitive load on humans
to observe their dynamic behaviors, including their mistakes, and their
changing characteristics such as their expertise. We conduct user studies and
show that our framework effectively collaborates non-stop for hours and adapts
to various changing human behaviors and characteristics in real-time. That
increases the efficiency and naturalness of the collaboration with a higher
perceived collaboration, positive teammate traits, and human trust. We believe
that such an extended human adaptation is key to the long-term use of cobots.
Comment: The article is in review for publication in the International Journal of
Robotics Research
- …